Method for encoding a structured document

ABSTRACT

A method for encoding a structured document, particularly an XML document, is provided, in which the contents of the document are converted into a binary representation. This binary representation is divided into encoding units, which form an encoded data flow and can be read out from the encoded data flow. The encoded data flow contains configuration data, with which configuration information concerning the division of the binary representation into encoding units can be read out before one or more encoding units are read out.

This application is the national phase under 35 U.S.C. §371 of PCT International Application No. PCT/EP2004/001992, which has an International filing date of Feb. 27, 2004, which designated the United States of America and which claims priority on German Patent Application number DE 103 09 336.2 filed Mar. 4, 2003, the entire contents of which are hereby incorporated herein by reference.

FIELD

The invention generally relates to a method for encoding a structured document, a decoding method and/or a corresponding encoding and/or decoding device. For example, it relates to a method in which a binary representation of a structured, in particular XML-based, document (XML=Extensible Markup Language) is encoded and/or decoded with the aid of a scheme.

BACKGROUND

Encoding and decoding methods are described, for example, in publications concerning the MPEG-7 standard, in particular in document [1]. These methods allow the contents of the document, in particular elements and/or attributes and/or data types, to be determined with the aid of bit patterns in an encoded data flow. In this case, the encoded contents are stored in so-called FUUs (FUU=fragment update unit), in which the entire content of the element and/or attribute and/or data type need not be contained in the FUU. Parts of this element and/or attribute and/or data type can be encoded in subsequent FUUs.

The content of XML documents is frequently further processed by a recipient, and prepared for example for display. For this purpose, it is often the case that only specific elements and/or attributes and/or data types are filtered out from the XML document. The process of filtering can be specified, for instance, in a so-called XSLT (XSLT=Extensible Stylesheet Language Transformations).

According to the prior art, it has proven disadvantageous in applications for processing an XML document that, in order to filter out contents, the whole document is decoded from the bit flow and is only then filtered. The filtering can be accelerated by way of technologies known from the prior art such that FUUs which cannot contain the content to be filtered, as a result of the information contained in the so-called context path of the FUU, are not decoded. It is, however, not possible to reliably determine, on the basis of the context path, which FUUs actually contain the desired content.

SUMMARY

An object of at least one embodiment of the invention is to create a method for encoding a structured document which enables simpler and faster filtering of contents from the document.

With the method according to at least one embodiment of the invention for encoding a structured document, in particular an XML document, the contents of the document are converted into a binary representation. The binary representation is divided into encoding units, which form an encoded data flow, it being possible to read out the encoding units from the encoded data flow. The encoded data flow contains configuration data, with which configuration information concerning the division of the binary representation into encoding units can be read out before one or more encoding units are read out.

Therefore, in order to filter out specific contents from the document, it is no longer necessary to decode the entire encoded data flow. Instead, it is already possible to determine from the encoded data flow which contents the individual encoding units contain. The filtering of a structured document can thus be significantly accelerated.

In at least one example embodiment of the invention, the configuration information is, in particular, information concerning missing contents in predetermined encoding units. It is thus possible to determine from the encoded data flow which contents are missing in an encoding unit. There is therefore no need to decode this encoding unit if the filtering searches for precisely this missing content.

In at least one further example embodiment, the configuration data is itself encoded in the encoded data flow, as a result of which the encoding efficiency is increased.

In one configuration of at least one example embodiment of the invention, the configuration data is the configuration information, this configuration information being added to the encoded data flow. In particular, the configuration information can be textually encoded in the form of an XML document. Alternatively, the configuration information can be encoded using an MPEG encoding method.

In at least one example embodiment, the configuration data includes references to configuration information, with which configuration information is selected from stored configuration information. The entire configuration information then no longer needs to be transmitted. Instead, this information can be stored in a storage area which can be accessed by the decoder.
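
As an illustration of this reference-based variant, the following minimal Python sketch shows how a decoder might resolve a reference carried in the configuration data against configuration information held in its own storage area. The table STORED_CONFIGURATIONS, its keys and field names, and the function resolve_configuration are hypothetical and chosen only for this example; they are not taken from any MPEG profile.

```python
# Hypothetical configuration information stored on the decoder side.
# Each entry describes how an encoder divides content into encoding units.
STORED_CONFIGURATIONS = {
    1: {"access_unit_nodes": "gBSDUnit", "unit_selectors": ["./@marker", "."]},
    2: {"access_unit_nodes": "gBSDUnit", "unit_selectors": ["."]},
}


def resolve_configuration(reference):
    """Return the stored configuration information selected by `reference`.

    Only the short reference travels in the encoded data flow; the full
    configuration information is looked up locally by the decoder.
    """
    try:
        return STORED_CONFIGURATIONS[reference]
    except KeyError:
        raise ValueError(f"unknown configuration reference: {reference}") from None
```

In this sketch an unknown reference is treated as an error; a real decoder could equally fall back to decoding every encoding unit.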

The document to be encoded is preferably an MPEG description flow, in particular an MPEG-7 or MPEG-21 description flow, the encoding units being fragment update units which in turn form access units. A description of the encoding standard MPEG-21 can be found in document [2], for instance. The stored configuration information is preferably contained in profiles of an MPEG standard, in particular of the MPEG-7 or the MPEG-21 standard.

In at least one particular example embodiment, the structured document is an XML document including elements and/or attributes and/or data types. If the configuration information is information concerning missing contents, the missing contents particularly include at least one element and/or one attribute and/or one data type.

In addition to the above-described example embodiments of a method for encoding a data flow, at least one additional example embodiment of the invention further includes a method for decoding an encoded data flow, the method being designed such that a data flow encoded with the encoding method according to at least one embodiment of the invention is decoded. In this case, the configuration information may, for example, be read out from the encoded data flow.

Furthermore, at least one example embodiment of the invention relates to a method for encoding and/or decoding a data flow including the above-described encoding method according to at least one example embodiment of the invention and/or the above-mentioned decoding method according to at least one example embodiment of the invention.

At least one example embodiment of the invention further includes an encoding device, which is designed such that the encoding method according to at least one example embodiment of the invention can be implemented, and/or a decoding device, which is designed such that the decoding method according to at least one example embodiment of the invention can be implemented. Furthermore, at least one example embodiment of the invention relates to an encoding and decoding device comprising an inventive encoding device and an inventive decoding device.

BRIEF DESCRIPTION OF THE DRAWINGS

Example embodiments of the invention are described below in more detail with reference to the attached drawings, in which:

FIG. 1 shows a schematic representation of an encoding and decoding system, in which the encoding and decoding method according to at least one example embodiment of the invention is implemented;

FIG. 2 shows a schematic representation of the structure of an FUU;

FIG. 3 shows an example of a syntax of an XML document, from which information is to be filtered;

FIG. 4 shows an example of a filter specification for filtering out specific information from the binary representation of the XML document in FIG. 3; and

FIG. 5 shows an exemplary representation of an encoding configuration formatted as an XML document which can be used in the method according to at least one example embodiment of the invention.

DETAILED DESCRIPTION OF THE EXAMPLE EMBODIMENTS

FIG. 1 shows an example encoding and decoding system according to at least one example embodiment of the invention, with an encoder ENC and a decoder DEC, with which XML documents DOC are encoded and/or decoded. Both the encoder and the decoder have a so-called scheme S in which elements and types of the XML document used for communication are declared and defined.

Code tables CT are generated from the scheme S by way of corresponding scheme compilations SC in the encoder and decoder. When the XML document DOC is encoded, the contents of the XML document are assigned binary codes by way of the code tables.
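
The idea of deriving identical code tables on both sides from a shared scheme can be sketched in Python as follows. This is only an assumed, simplified assignment of fixed-length codes to declared names in sorted order; it is not the code assignment defined by MPEG-7, and the function names build_code_table and encode_names are hypothetical.

```python
def build_code_table(declared_names):
    """Assign a fixed-length binary code to every name declared in the scheme.

    Encoder and decoder run this on the same scheme and therefore obtain
    the same table without transmitting it.
    """
    names = sorted(set(declared_names))
    # Number of bits needed to distinguish all names (at least one bit).
    width = max(1, (len(names) - 1).bit_length())
    return {name: format(index, f"0{width}b") for index, name in enumerate(names)}


def encode_names(names, code_table):
    """Concatenate the binary codes of a sequence of declared names."""
    return "".join(code_table[name] for name in names)
```

Because the table is computed deterministically from the declared names, any change to the scheme changes the codes on both sides in the same way.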

Subsequently, the codes are divided in the encoder into so-called fragment update units FUU, which are described in more detail in relation to FIG. 2. The division of the codes into FUUs depends on the configuration of the encoder. The document DOC is thus converted into an encoded binary format BDOC, which is subsequently transmitted to the decoder and then in turn decoded with the aid of the code table CT, thereby reproducing the original document DOC.

In the method according to at least one example embodiment of the invention, information EC concerning the division of the contents of the XML document into FUUs carried out by the encoder is transmitted prior to or in parallel with the transmission of the binary representation of the XML document.

FIG. 2 shows the components of a fragment update unit FUU, which represents the binary format of an MPEG-7 description flow. A unit of this type contains a fragment update command, which specifies which operation is to be carried out in one node of the description tree of an XML document. Furthermore, the unit includes a fragment update context, which contains, among other things, a so-called context path, by which the path in the description tree of the document to the node at which the fragment update command is to be implemented is specified.

The context path determines the maximum information that can be contained in an FUU. Finally, the FUU contains the fragment update payload, i.e. the encoded information to be processed in the corresponding node. For a more precise description of the structure of an FUU, reference should be made to document [3].
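
The three components of an FUU described above can be modelled, purely for illustration, with the following Python data structures. The class and field names are assumptions made for this sketch, and representing the context path as a string of "0" and "1" characters is a simplification of the actual binary format described in document [3].

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class FragmentUpdateUnit:
    command: str       # fragment update command: operation to carry out at the node
    context_path: str  # bit pattern of the path to the node in the description tree
    payload: bytes     # fragment update payload: encoded information for that node


@dataclass
class AccessUnit:
    # An access unit combines several fragment update units.
    units: List[FragmentUpdateUnit] = field(default_factory=list)
```

Keeping the context path separate from the payload reflects the point made above: the path can be inspected without touching the payload.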

An encoded data flow includes a plurality of fragment update units of this type, these FUUs being in turn combined into so-called access units. In the embodiment of the method according to the invention described here, in addition to the FUUs, configuration information EC is additionally transmitted in the encoded data flow to the decoder, the configuration information specifying how an XML document is divided into FUUs.

FIG. 3 reproduces an example of the content of an XML document to be encoded. The document comprises, among other things, four elements termed “gBSDUnit”, two of these elements containing a so-called marker attribute. FIG. 4 shows a filter specification, according to which the document encoded using the method according to at least one example embodiment of the invention is to be filtered. The filter specification determines that a context path is to be sought which contains the element gBSDUnit with the marker attribute. In the present case, this specification corresponds to the bit pattern “11010”.

To filter this information from the encoded data flow with the least possible decoding effort, the configuration information of the encoder displayed in XML format in FIG. 5 is transmitted to the decoder. This specifies that an access unit contains only gBSDUnits (line 4: <Nodes type="gBSDUnit"/>). Furthermore, it is established that an access unit contains two fragment update units, the first fragment update unit containing a marker attribute of a gBSDUnit in each instance (line 8: <selector ref="./@marker"></selector>) and the second fragment update unit containing a gBSDUnit in each instance, whereby in the case of gBSDUnits containing marker attributes, these attributes are not stored in this fragment update unit (line 16: <except ref="./@marker"/>). By transmitting the information represented in FIG. 5 to the decoder DEC, specific marker attributes can be sought significantly faster, since:

- the decoder knows that marker attributes are not contained in FUUs containing gBSDUnits, so the gBSDUnits contained in the fragment update payloads need not be decoded for this purpose, and
- the decoder need only decode FUUs whose context path (see FIG. 4) comprises the bit pattern of a context path to a marker attribute.

As the comparison of bit patterns can be implemented significantly faster than the decoding of fragment update payloads, the transmission of the configuration information of the encoder can significantly accelerate the filtering.
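
The filtering step described above can be sketched in Python as follows. The sketch assumes that context paths are available as strings of "0" and "1" characters and that a plain substring test is an adequate stand-in for the bit pattern comparison; a real decoder would have to respect code boundaries. The names FragmentUpdateUnit, filter_units and decode_payload are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class FragmentUpdateUnit:
    context_path: str  # bit pattern of the context path (simplified as text)
    payload: bytes     # encoded fragment update payload


def filter_units(
    units: Iterable[FragmentUpdateUnit],
    wanted_pattern: str,
    decode_payload: Callable[[bytes], str],
) -> List[str]:
    """Decode only the units whose context path comprises `wanted_pattern`.

    All other units are skipped after a cheap comparison, so their
    payloads are never passed to the (expensive) payload decoder.
    """
    results = []
    for unit in units:
        if wanted_pattern not in unit.context_path:
            continue  # cannot contain the sought content: skip without decoding
        results.append(decode_payload(unit.payload))
    return results
```

Called with the bit pattern “11010” from the example of FIG. 4, such a function would invoke the payload decoder only for the FUUs holding marker attributes.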

BIBLIOGRAPHY

[1] Text of ISO/IEC FCD 15938-1 Information Technology - Multimedia Content Description Interface - Part 1, Systems.

[2] Text of ISO/IEC CD 21000-7 Information Technology - Multimedia Framework - Part 7, Digital Item Adaptation.

[3] J. Heuer, C. Thienot, M. Wollborn, “Binary Format”, in “Introduction to MPEG-7”, Editors: B. S. Manjunath, P. Salembier, T. Sikora, John Wiley & Sons, West Sussex, 2002, pages 61-80.

Example embodiments being thus described, it will be obvious that the same may be varied in many ways. Such variations are not to be regarded as a departure from the spirit and scope of the present invention, and all such modifications as would be obvious to one skilled in the art are intended to be included within the scope of the following claims.

1. Method for encoding a structured document, comprising: converting the contents of the document into a binary representation; dividing the binary representation into encoding units, which form an encoded data flow, readable out from the encoded data flow; and including configuration data in the encoded data flow, by which configuration information concerning the division of the binary representation into encoding units is readable out prior to reading out one or more encoding units.

2. Method according to claim 1, wherein the configuration information contains information concerning missing contents in predetermined encoding units.

3. Method according to claim 2, wherein the encoded data flow contains references to at least one of the locations at which the missing contents are located in the encoded data flow, and to the encoding units containing the missing contents.

4. Method according to claim 1, wherein the configuration data is encoded.

5. Method according to claim 1, wherein the configuration data is the configuration information and is added to the encoded data flow.

6. Method according to claim 5, wherein the configuration information is textually encoded in the form of an XML document.

7. Method according to claim 5, wherein the configuration information is encoded with an MPEG encoding method.

8. Method according to claim 1, wherein the configuration data includes references to configuration information, with which configuration information is selected from stored configuration information.

9. Method according to claim 1, wherein the document is an MPEG description flow, the encoding units being fragment update units which in turn form access units.

10. Method according to claim 8, wherein the stored configuration information is contained in profiles of an MPEG standard.

11. Method according to claim 1, wherein the structured document is an XML document including at least one of elements, attributes and data types.

12. Method according to claim 2, wherein the structured document is an XML document including at least one of elements, attributes and data types and wherein missing contents comprise at least one element, attribute and data type.

13. Method for decoding an encoded data flow, comprising decoding a data flow encoded with a method according to claim 1.

14. Method according to claim 13, wherein the configuration information is read out.

15. Method for encoding and decoding a data flow, comprising encoding a data flow with a method according to claim 1 and decoding the encoded data flow.

16. Encoding device, designed to implement a method according to claim 1.

17. Decoding device, designed to implement a method according to claim 13.

18. Encoding and decoding device, designed to implement a method according to claim 15.

19. The method of claim 1, wherein the document is an XML document.

20. Method according to claim 2, wherein the configuration data is the configuration information and is added to the encoded data flow.

21. Method according to claim 20, wherein the configuration information is textually encoded in the form of an XML document.

22. Method according to claim 20, wherein the configuration information is encoded with an MPEG encoding method.

23. Method according to claim 1, wherein the document is at least one of an MPEG-7 and an MPEG-21 description flow, the encoding units being fragment update units which in turn form access units.

24. Method according to claim 9, wherein the stored configuration information is contained in profiles of an MPEG standard.

25. Method according to claim 23, wherein the stored configuration information is contained in profiles of at least one of the MPEG-7 and the MPEG-21 standard.

26. Device for encoding a structured document, comprising: means for converting the contents of the document into a binary representation; means for dividing the binary representation into encoding units, which form an encoded data flow, readable out from the encoded data flow; and means for including configuration data in the encoded data flow, by which configuration information concerning the division of the binary representation into encoding units is readable out prior to reading out one or more encoding units.

27. The device of claim 26, wherein the document is an XML document.

28. Device for encoding and decoding a structured document, comprising: means for converting the contents of the document into a binary representation; means for dividing the binary representation into encoding units, which form an encoded data flow, readable out from the encoded data flow; means for including configuration data in the encoded data flow, by which configuration information concerning the division of the binary representation into encoding units is readable out prior to reading out one or more encoding units; and means for decoding the encoded structured document.

29. The device of claim 28, wherein the document is an XML document.